Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

NickGoodfella
by New Contributor
  • 1763 Views
  • 1 replies
  • 1 kudos

DNS_Analytics Notebook Problems

Hello everyone! First post on the forums. I've been stuck on this for a while now and cannot seem to understand why it is happening. Basically, I have been using what seems to be a premade Databricks notebook from Databricks themselves for a DNS Analytics exa...

  • 1763 Views
  • 1 replies
  • 1 kudos
Latest Reply
sean_owen
Databricks Employee
  • 1 kudos

@NickGoodfella , What's the notebook you're looking at, this one? https://databricks.com/notebooks/dns-analytics.html Are you sure all the previous cells executed? This is suggesting there isn't a model at the path that's expected. You can take a lo...

  • 1 kudos
User16826994223
by Honored Contributor III
  • 1175 Views
  • 1 replies
  • 0 kudos

Streaming state is growing too large

I have a customer with a streaming pipeline from Kafka to Delta. They are leveraging RocksDB, watermarking for 30 min, and attempting to dropDuplicates. They are seeing their state grow to 6.2 billion rows on a stream that hits at maximum 7000 rows ...

  • 1175 Views
  • 1 replies
  • 0 kudos
Latest Reply
shaines
New Contributor II
  • 0 kudos

I've seen a similar issue with large state using flatMapGroupsWithState. It is possible that A.) they are not using the state.setTimeout correctly or B.) they are not calling state.remove() when the stored state has timed out, leaving the state to gr...

  • 0 kudos
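The two failure modes named in the reply can be illustrated with a minimal flatMapGroupsWithState sketch. This is an illustration only: the Event type, the dedup-by-id logic, and the 30-minute timeout are assumptions, not the customer's actual code.

```scala
// Sketch only. GroupStateTimeout.ProcessingTimeTimeout must be passed to
// flatMapGroupsWithState for setTimeoutDuration to take effect.
case class Event(id: String, key: String)

def dedupe(key: String, events: Iterator[Event],
           state: GroupState[Set[String]]): Iterator[Event] = {
  if (state.hasTimedOut) {
    state.remove()                         // (B) drop expired state explicitly
    Iterator.empty
  } else {
    val seen  = state.getOption.getOrElse(Set.empty[String])
    val fresh = events.filterNot(e => seen.contains(e.id)).toSeq
    state.update(seen ++ fresh.map(_.id))
    state.setTimeoutDuration("30 minutes") // (A) re-arm the timeout on every call
    fresh.iterator
  }
}
```

If either the timeout is never armed or `state.remove()` is never reached, every key's state lives forever, which matches the unbounded growth described in the question.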
PadamTripathi
by New Contributor II
  • 5515 Views
  • 2 replies
  • 1 kudos

How to calculate median on an Azure Databricks Delta table using SQL

How to calculate the median on Delta tables in Azure Databricks using SQL? select col1, col2, col3, median(col5) from delta_table group by col1, col2, col3

  • 5515 Views
  • 2 replies
  • 1 kudos
Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

Try the percentile function, since the median is the 50th percentile: https://spark.apache.org/docs/latest/api/sql/#percentile

  • 1 kudos
1 More Replies
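A sketch of the rewritten query, using the column names from the question; the table name is an assumption:

```sql
SELECT col1, col2, col3,
       percentile(col5, 0.5) AS median_col5  -- exact median; percentile_approx is cheaper on very large groups
FROM my_delta_table
GROUP BY col1, col2, col3;
```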
AlexDavies
by Contributor
  • 2253 Views
  • 1 replies
  • 0 kudos

Generated partition column not being used by optimizer

We have created a table using the new generated column feature (https://docs.microsoft.com/en-us/azure/databricks/delta/delta-batch#deltausegeneratedcolumns) CREATE TABLE ingest.MyEvent( data binary, topic string, timestamp timestamp, date dat...

  • 2253 Views
  • 1 replies
  • 0 kudos
Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

I think you have to pass a date in your select query instead of a timestamp. The generated column will indeed derive a date from the timestamp and partition by it. But the docs state: When you write to a table with generated columns and you do not ex...

  • 0 kudos
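A sketch of the shape the reply suggests, using the table and column names from the question; the date literal is an assumption for illustration:

```sql
-- Filtering on the generated partition column with a date value lets the
-- optimizer prune partitions, instead of filtering only on the raw timestamp.
SELECT *
FROM ingest.MyEvent
WHERE date = DATE'2021-06-01';
```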
irfanaziz
by Contributor II
  • 1432 Views
  • 1 replies
  • 1 kudos

What could be the issue with parquet file?

When trying to update or display the dataframe, one of the parquet files is having some issue: "Parquet column cannot be converted. Expected: DecimalType(38,18), Found: DOUBLE". What could be the issue?

  • 1432 Views
  • 1 replies
  • 1 kudos
Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

Try explicitly casting the double column to decimal(38,18) and then do the display.

  • 1 kudos
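A sketch of the suggested cast; the table and column names are assumptions, since the question does not name them:

```sql
SELECT CAST(amount AS DECIMAL(38, 18)) AS amount
FROM my_table;
```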
dbsuersu
by New Contributor II
  • 1527 Views
  • 1 replies
  • 0 kudos

"dbfs:" prefix added to file path

There is a mount path /mnt/folder. I am passing the filename as a variable from another function and completing the path variable as follows: filename=file.txt path=/mnt/folder/subfolder/+filename When I'm trying to use the path variable in a function, f...

  • 1527 Views
  • 1 replies
  • 0 kudos
Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

Databricks uses the Databricks File System (DBFS) by default, so my guess is you did not mount the path in Databricks.

  • 0 kudos
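A minimal sketch of the path handling, using the folder and file names from the question. On Databricks, Spark APIs resolve a bare `/mnt/...` path as `dbfs:/mnt/...`, while local Python file APIs need the `/dbfs/mnt/...` FUSE form instead, which is often the source of a surprise `dbfs:` prefix.

```python
# Sketch only; /mnt/folder/subfolder and file.txt come from the question.
import posixpath

def build_path(filename: str, base: str = "/mnt/folder/subfolder") -> str:
    # posixpath.join handles the separator, avoiding string concatenation bugs.
    # Spark reads this as dbfs:/mnt/...; local file APIs need /dbfs/mnt/... instead.
    return posixpath.join(base, filename)

print(build_path("file.txt"))
```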
reza-eghbali
by New Contributor
  • 1324 Views
  • 0 replies
  • 0 kudos

Kafka consumer and a web server simultaneously, thread blocking problem in microservice

Assumptions: there are microservices behind an API gateway, and they communicate through HTTP synchronously. Obviously, each one of those microservices is a web server. Now I want my microservice to play as a Kafka producer and "consumer" too. More clea...

  • 1324 Views
  • 0 replies
  • 0 kudos
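One common shape for this problem is to run the consumer loop on its own daemon thread so the web server's request threads are never blocked by polling. Below is a hedged sketch of that pattern; `queue.Queue` stands in for the Kafka consumer client here and is an assumption, not the real Kafka API.

```python
# Sketch: a bounded-poll consumer loop on a daemon thread, decoupled from
# the request-handling threads of the web server.
import queue
import threading
import time

inbox: queue.Queue = queue.Queue()  # stand-in for the Kafka consumer
handled = []

def consume_loop(stop: threading.Event) -> None:
    while not stop.is_set():
        try:
            msg = inbox.get(timeout=0.1)  # bounded poll, never a blocking wait
        except queue.Empty:
            continue
        handled.append(msg)

stop = threading.Event()
worker = threading.Thread(target=consume_loop, args=(stop,), daemon=True)
worker.start()

inbox.put("order-created")   # a producer elsewhere publishes an event
for _ in range(200):         # wait briefly for the consumer to pick it up
    if handled:
        break
    time.sleep(0.01)
stop.set()
worker.join()
print(handled)
```

The same structure works with a real Kafka client: the poll call replaces `inbox.get`, and the stop event gives the web server a clean shutdown path.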
JoãoRafael
by New Contributor II
  • 3149 Views
  • 3 replies
  • 0 kudos

Double job execution caused by databricks' RemoteServiceExec using databricks-connector

Hello! I'm using databricks-connect to launch Spark jobs using Python. I've validated that the Python version (3.8.10) and runtime version (8.1) are supported by the installed databricks-connect (8.1.10). Every time a mapPartitions/foreachParti...

  • 3149 Views
  • 3 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

A community forum to discuss working with Databricks Cloud and Spark. ... Double job execution caused by databricks' RemoteServiceExec using databrick...

  • 0 kudos
2 More Replies
Kotofosonline
by New Contributor III
  • 1200 Views
  • 1 replies
  • 0 kudos

Bug report: Date type with year less than 1000 (years 1-999) in Spark SQL WHERE [solved]

Hi, I noticed unexpected behavior for the Date type. If the year value is less than 1000, filtering does not work. Steps: create table test (date Date); insert into test values ('0001-01-01'); select * from test where date = '0001-01-01' Returns 0 rows....

  • 1200 Views
  • 1 replies
  • 0 kudos
Latest Reply
Kotofosonline
New Contributor III
  • 0 kudos

Hm, seems to work now.

  • 0 kudos
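For anyone hitting this on an older runtime, a sketch of the same filter written with an explicit DATE literal, which removes any ambiguity in how the string is coerced to a date:

```sql
SELECT * FROM test WHERE date = DATE'0001-01-01';
```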
max651
by New Contributor
  • 1316 Views
  • 0 replies
  • 0 kudos

How to create circles and find diameter in point cloud?

Good day, everybody. I have a task: I should create many circles in a point cloud (my file can be .csv) and calculate their diameters in a 100 m x 100 m area. I have the coordinates of the starting point. The circles should be created at the height ...

  • 1316 Views
  • 0 replies
  • 0 kudos
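One piece of this task can be sketched directly: once the points belonging to a single circle have been isolated (the clustering step is not shown), its diameter can be estimated as the maximum pairwise distance between the (x, y) points. The sample coordinates below are made up for illustration.

```python
# Sketch: diameter of one circle's point set as the max pairwise distance.
from itertools import combinations
from math import dist

circle_points = [(0.0, 1.0), (0.0, -1.0), (1.0, 0.0), (-1.0, 0.0)]
diameter = max(dist(a, b) for a, b in combinations(circle_points, 2))
print(diameter)  # 2.0
```

For noisy real data, a least-squares circle fit would be more robust than the max-distance estimate, at the cost of more code.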
Lon_Fortes
by New Contributor III
  • 6830 Views
  • 2 replies
  • 1 kudos

Resolved! How can I check that column on a delta table has a "NOT NULL" constraint or not?

Title pretty much says it all - I'm trying to determine whether or not a column on my existing delta table was defined as NOT NULL. It does not show up in any of the metadata (describe detail, describe history, show tblproperties). Thanks in...

  • 6830 Views
  • 2 replies
  • 1 kudos
Latest Reply
Matthew8
New Contributor II
  • 1 kudos

A UNIQUE constraint defines a set of columns that uniquely identify rows in a table only if all the key values are not NULL. If one or more key parts are NULL, duplicate keys are allowed.

  • 1 kudos
1 More Replies
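One place the constraint does surface is the generated DDL; a sketch, with the table name assumed:

```sql
-- SHOW CREATE TABLE prints the column list, including any NOT NULL markers:
SHOW CREATE TABLE my_schema.my_table;
```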
User16857281869
by New Contributor II
  • 846 Views
  • 1 replies
  • 0 kudos

We want to do demand forecasting for our supply chain. How should we benefit from Spark in the use-case development?

We have a series of blogs on the topic which describe the challenges and the best practices for developing demand forecasting use cases on Databricks. Please refer to this blog and the references in it for more info.

  • 846 Views
  • 1 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

We have a series of blogs on the topic which describe the challenges and the best practices for developing demand forecasting use cases on Databricks. Please refer to this blog and the references in it for more info.

  • 0 kudos
irfanaziz
by Contributor II
  • 2159 Views
  • 1 replies
  • 1 kudos

Resolved! How to keep the original Swedish/Finnish character in the file?

The files are in ANSI format, as shown in Notepad. I could manually convert the files to UTF-8 and read them, but the files are really large and I don't want to download and upload them. Is there a way I could keep the Swedish/Finnish characte...

  • 2159 Views
  • 1 replies
  • 1 kudos
Latest Reply
irfanaziz
Contributor II
  • 1 kudos

So the answer was using the option("charset", "iso-8859-1").

  • 1 kudos
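A quick sketch of why this works, assuming the "ANSI" files are Latin-1 (iso-8859-1): the encoding round-trips the Swedish/Finnish letters, so the files can be read in place without converting them to UTF-8 first. In Spark, the equivalent is the `option("charset", "iso-8859-1")` from the answer; the sample string here is made up.

```python
# Latin-1 bytes, as they would sit in the ANSI file, decode cleanly:
raw = "Mörkö på vägen".encode("iso-8859-1")
text = raw.decode("iso-8859-1")
print(text)
```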
SarahDorich
by New Contributor II
  • 3308 Views
  • 3 replies
  • 0 kudos

How to register datasets for Detectron2

I'm trying to run a Detectron2 model in Databricks and cannot figure out how to register my train, val and test datasets. My datasets live in an Azure data lake. I have tried the following with no luck. Any help is appreciated. 1) Specifying full p...

  • 3308 Views
  • 3 replies
  • 0 kudos
Latest Reply
Thurman
New Contributor II
  • 0 kudos

Register your dataset. Optionally, register metadata for your dataset.

  • 0 kudos
2 More Replies
hmcdowelle
by New Contributor II
  • 13887 Views
  • 18 replies
  • 0 kudos

I just can't seem to make a cluster without an error

I have been trying to create a cluster and this is my first time using Databricks. I have tried across multiple resources and am getting frustrated. Each time, the cluster comes up with an error. I have no idea what I am doing wrong. I use default se...

  • 13887 Views
  • 18 replies
  • 0 kudos
Latest Reply
kwayebgh
New Contributor II
  • 0 kudos

I have a free Azure student account and I was facing similar challenges. This is how I solved mine after many hours of trial and error; mine is working now. When creating the Azure Databricks resource group, select Premium. Don't use the 14-day tri...

  • 0 kudos
17 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group