Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

RajuNagarajan
by New Contributor
  • 643 Views
  • 1 replies
  • 0 kudos

GroupBy in a multi node environment

I have a group of rows with information on nested product calls, for example: Trxn1-product1-caller1-local1, Trxn1-Product1-local1-local2, Trxn1-Product1-local2-local3. Here are the expected calls for a product: product1-caller1-local1, Product1-local1-loc...
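The post is truncated, so the exact goal is unclear. One plausible reading is that each row is one hop (transaction, product, caller, callee) and the hops of a transaction have to be reassembled into an ordered call chain before comparing them with the expected calls. In Spark, grouping by the transaction key (for example with `groupBy(...).applyInPandas(...)`) delivers all rows of a group to the same task, so per-group logic like this does not depend on how many nodes the cluster has. A minimal plain-Python sketch of that per-group logic follows; the function name `build_call_chains` and the tuple layout are hypothetical, not taken from the post.

```python
from collections import defaultdict


def build_call_chains(rows):
    """Group (txn, product, caller, callee) hops and order each group into a chain.

    Hypothetical assumptions: each callee is called at most once per group, and
    every chain has exactly one starting caller that never appears as a callee.
    """
    groups = defaultdict(dict)
    for txn, product, caller, callee in rows:
        # Normalise product case, since the post mixes "product1" and "Product1".
        groups[(txn, product.lower())][caller] = callee

    chains = {}
    for key, edges in groups.items():
        callees = set(edges.values())
        # The chain starts at the caller that nobody else calls.
        start = next((c for c in edges if c not in callees), None)
        if start is None:
            continue  # cyclic calls: no unique starting caller, skip the group
        chain = [start]
        # Follow caller -> callee links; the length check guards against cycles.
        while chain[-1] in edges and len(chain) <= len(edges):
            chain.append(edges[chain[-1]])
        chains[key] = chain
    return chains


rows = [
    ("Trxn1", "product1", "caller1", "local1"),
    ("Trxn1", "Product1", "local1", "local2"),
    ("Trxn1", "Product1", "local2", "local3"),
]
print(build_call_chains(rows))
# {('Trxn1', 'product1'): ['caller1', 'local1', 'local2', 'local3']}
```

The resulting ordered list can then be compared hop by hop with the expected call sequence for the product.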

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @RajuNagarajan! My name is Kaniz, and I'm a technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your question first. Otherwise, I will follow up shortly with a response.

lprevost
by New Contributor III
  • 1314 Views
  • 1 replies
  • 0 kudos

Incremental updates to s3 csv files, autoloader, and delta lake updates

I'm using the Databricks autoloader to incrementally load a series of CSV files on S3, which I update with an API. My typical workflow is to update only the latest year's file each night. But there are occasions where previous years also get update...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @lprevost! My name is Kaniz, and I'm a technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your question first. Otherwise, I will follow up shortly with a response.

MohitAnchlia
by New Contributor II
  • 1134 Views
  • 1 replies
  • 0 kudos

Accessing Databricks from Presto SQL

What's the best way to federate a query to Delta Lake or Databricks from Presto SQL without having to create external tables? PrestoSQL doesn't have access to S3. Can PrestoSQL be configured with a JDBC driver or plugin?

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @MohitAnchlia! My name is Kaniz, and I'm a technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your question first. Otherwise, I will follow up shortly with a response.

MalachiBunn
by New Contributor
  • 1254 Views
  • 2 replies
  • 0 kudos

Toggle titles to show by default for a user or notebook

I find titles to be useful in organizing my notebooks, but I don't like having to toggle the title display for each cell in order to add a title. Is there a way to toggle the UI to show titles by default for a user/notebook? This would be a good fea...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi there! I am here to help you. If you want a new feature to be added, you can request it at this link: https://docs.databricks.com/resources/ideas.html.

PrasadGaikwad
by New Contributor
  • 9368 Views
  • 1 replies
  • 0 kudos

TypeError: Column is not iterable when using more than one column in withColumn()

I am trying to find the quarter start date from a date column. I get the expected result when I write it using selectExpr(), but when I add the same logic in .withColumn() I get TypeError: Column is not iterable. I am using a workaround as follows workarou...
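The truncated post does not show the failing code, but `TypeError: Column is not iterable` typically appears when a `Column` is passed where a PySpark function expects a plain Python value. Because the same logic already works in `selectExpr()`, wrapping that SQL string in `pyspark.sql.functions.expr()` inside `.withColumn()` is a common workaround, and on Spark 3.x `trunc(date_col, 'quarter')` returns the quarter start directly. The underlying arithmetic is small; the plain-Python sketch below (the helper `quarter_start` is hypothetical) is handy for checking expected values on a few sample dates.

```python
from datetime import date


def quarter_start(d: date) -> date:
    """Return the first day of the calendar quarter containing d."""
    # Months 1-3 map to 1, 4-6 to 4, 7-9 to 7 and 10-12 to 10.
    first_month = 3 * ((d.month - 1) // 3) + 1
    return date(d.year, first_month, 1)


print(quarter_start(date(2021, 5, 17)))   # 2021-04-01
print(quarter_start(date(2021, 12, 31)))  # 2021-10-01
```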

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @PrasadGaikwad! My name is Kaniz, and I'm a technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your question first. Otherwise, I will follow up shortly with a response.

haseebkhan1421
by New Contributor
  • 1923 Views
  • 1 replies
  • 3 kudos

How can I create a column on the fly that has the same value for all rows in a Spark SQL query?

I have a SQL query that I am converting into Spark SQL in Azure Databricks, running in my Jupyter notebook. In my SQL query, a column named Type is created on the fly with the value 'Goal' for every row: SELECT Type='Goal', Value FROM table. Now, when...

Latest Reply
Ryan_Chynoweth
Honored Contributor III
  • 3 kudos

The correct syntax would be: SELECT 'Goal' AS Type, Value FROM table
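The `'literal' AS alias` form in the answer is standard SQL. The original `Type='Goal'` is SQL Server's alias syntax; in Spark SQL it is parsed as an equality comparison against a column named `Type` instead. As a quick, engine-independent check, the query can be run against Python's built-in `sqlite3`, which here stands in for Spark SQL (an assumption; the table name `goals` is hypothetical because `table` is a reserved word). In the DataFrame API, `withColumn("Type", F.lit("Goal"))` adds the same constant column.

```python
import sqlite3

# In-memory database standing in for the real table (hypothetical name "goals").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE goals (Value INTEGER)")
conn.executemany("INSERT INTO goals (Value) VALUES (?)", [(10,), (20,), (30,)])

# A string literal with an alias yields the same value on every row.
rows = conn.execute("SELECT 'Goal' AS Type, Value FROM goals ORDER BY Value").fetchall()
print(rows)  # [('Goal', 10), ('Goal', 20), ('Goal', 30)]
conn.close()
```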

maheshwor
by New Contributor III
  • 869 Views
  • 1 replies
  • 2 kudos

Resolved! Databricks Views

How do we find the definition of a view in Databricks?

Latest Reply
Ryan_Chynoweth
Honored Contributor III
  • 2 kudos

You can use the extended table description. For example, the following Python code will print the current definition of the view: table_name = "" df = spark.sql("describe table extended {}".format(table_name)) df.createOrReplaceTempView("view_desript...

TimothyClotwort
by New Contributor
  • 3022 Views
  • 1 replies
  • 0 kudos

SQL Alter table command not working for me

I am a novice with Databricks and am doing some independent learning. I am trying to add a column to an existing table. Here is my syntax: %sql ALTER TABLE car_parts ADD COLUMNS (engine_present boolean) which returns the error: SyntaxError: inva...

Latest Reply
Ryan_Chynoweth
Honored Contributor III
  • 0 kudos

Is the table you are working with in the Delta format? Table commands (e.g., ALTER) do not work for all storage formats. For example, if I run the following commands, I can alter a table. Note: there is no data in the table, but the table exist...
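The reported error begins with `SyntaxError: inva...`, which looks like Python's parser error rather than a Spark SQL error. One possible explanation, an assumption not confirmed in the thread, is that the cell ran as Python, for example if `%sql` was not the first line of the cell. The snippet below reproduces that situation by handing the same SQL text to Python's own parser; `cell_source` is a hypothetical variable holding the cell text.

```python
# Hypothetical check: parse the cell's SQL as Python, as would happen if the
# notebook cell were executed by the Python interpreter instead of Spark SQL.
cell_source = "ALTER TABLE car_parts ADD COLUMNS (engine_present boolean)"

try:
    compile(cell_source, "<notebook cell>", "exec")
    outcome = "parsed as Python"
except SyntaxError as err:
    outcome = f"SyntaxError: {err.msg}"

print(outcome)
```

If the message matches, moving `%sql` to the top of the cell, or running the statement through `spark.sql(...)`, should route it to Spark SQL instead.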

rami1
by New Contributor II
  • 1354 Views
  • 1 replies
  • 0 kudos

Missing Databricks Datasets

Hi, I am looking at my Databricks workspace and it looks like I am missing the DBFS Databricks-dataset root folder. The DBFS root folders I can see are FileStore, local_disk(), mnt, pipelines and user. Can I mount Databricks-dataset or am I missing some...

Latest Reply
Ryan_Chynoweth
Honored Contributor III
  • 0 kudos

If you run the following command, do you receive an error, or do you just get an empty list? dbutils.fs.ls("/databricks-datasets")

User16826994223
by Honored Contributor III
  • 863 Views
  • 1 replies
  • 0 kudos

The state in a stream is growing too large

I have a customer with a streaming pipeline from Kafka to Delta. They are using RocksDB, a 30-minute watermark, and dropDuplicates. They are seeing their state grow to 6.2 billion rows on a stream that hits at maximum 7000 rows ...

Latest Reply
shaines
New Contributor II
  • 0 kudos

I've seen a similar issue with large state when using flatMapGroupsWithState. It is possible that (a) they are not using state.setTimeout correctly or (b) they are not calling state.remove() when the stored state has timed out, leaving the state to gr...
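For the `dropDuplicates` case, the Structured Streaming programming guide notes that the watermark only removes old deduplication state when the event-time column is among the deduplication columns; deduplicating on an ID alone keeps every ID ever seen. The post does not say which columns were used, so this is one possible explanation. The toy model below is plain Python, not Spark's implementation: it advances the watermark after every event rather than once per micro-batch, and the `(key, event_time)` layout and function names are hypothetical.

```python
def dedup_with_watermark(events, delay):
    """Toy model of dropDuplicates on (key, event_time) with a watermark.

    events: (key, event_time) pairs in arrival order, times in minutes.
    delay:  watermark delay in minutes.
    Returns (emitted rows, peak number of entries held in state).
    """
    state = set()
    max_event_time = float("-inf")
    emitted, peak = [], 0
    for key, ts in events:
        max_event_time = max(max_event_time, ts)
        watermark = max_event_time - delay
        if ts < watermark:
            continue  # late row: dropped without touching state
        if (key, ts) not in state:
            state.add((key, ts))
            emitted.append((key, ts))
        # Evict entries the watermark has passed; this is what bounds the state.
        state = {(k, t) for k, t in state if t >= watermark}
        peak = max(peak, len(state))
    return emitted, peak


def dedup_key_only(events):
    """Toy model of dropDuplicates on the key alone: nothing is ever evicted."""
    seen = set()
    for key, _ in events:
        seen.add(key)
    return len(seen)


# One unique event per minute for 1000 minutes, plus a duplicate of the last one.
events = [(f"id{i}", i) for i in range(1000)] + [("id999", 999)]
emitted, peak = dedup_with_watermark(events, delay=30)
print(len(emitted), peak)      # 1000 31
print(dedup_key_only(events))  # 1000
```

With the event time in the key, the state stays at roughly one watermark window of rows; with the key alone, it grows with every distinct ID.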

PadamTripathi
by New Contributor II
  • 4686 Views
  • 2 replies
  • 1 kudos

How to calculate the median on an Azure Databricks Delta table using SQL

How do I calculate the median on Delta tables in Azure Databricks using SQL? select col1, col2, col3, median(col5) from delta table group by col1, col2, col3

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

Try the percentile function, since the median is the 50th percentile: https://spark.apache.org/docs/latest/api/sql/#percentile
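Following that reply, the query from the question could be written as `SELECT col1, col2, col3, percentile(col5, 0.5) AS median_col5 FROM delta_table GROUP BY col1, col2, col3`, where `delta_table` stands in for the real table name. Spark's exact `percentile` interpolates linearly between neighbouring values, so with an even row count it returns the average of the two middle values, matching the textbook median; `percentile_approx` is cheaper on large data but returns an actual value from the column. The plain-Python sketch below mirrors that grouping and interpolation for checking results on a small sample; it is an illustration, not Spark's code, and the helper names are hypothetical.

```python
import math
from collections import defaultdict


def percentile(values, p):
    """Exact percentile with linear interpolation between the closest ranks."""
    xs = sorted(values)
    pos = p * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def median_by_group(rows):
    """Group (col1, col2, col3, col5) rows and take percentile(col5, 0.5) per group."""
    groups = defaultdict(list)
    for col1, col2, col3, col5 in rows:
        groups[(col1, col2, col3)].append(col5)
    return {key: percentile(vals, 0.5) for key, vals in groups.items()}


rows = [
    ("a", "x", 1, 4.0), ("a", "x", 1, 1.0), ("a", "x", 1, 3.0), ("a", "x", 1, 2.0),
    ("b", "y", 2, 10.0), ("b", "y", 2, 30.0), ("b", "y", 2, 20.0),
]
print(median_by_group(rows))
# {('a', 'x', 1): 2.5, ('b', 'y', 2): 20.0}
```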

AlexDavies
by Contributor
  • 1181 Views
  • 1 replies
  • 0 kudos

Generated partition column not being used by the optimizer

We have created a table using the new generated column feature (https://docs.microsoft.com/en-us/azure/databricks/delta/delta-batch#deltausegeneratedcolumns) CREATE TABLE ingest.MyEvent( data binary, topic string, timestamp timestamp, date dat...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

I think you have to pass a date in your select query instead of a timestamp. The generated column will indeed derive a date from the timestamp and partition by it. But the docs state: When you write to a table with generated columns and you do not ex...

irfanaziz
by Contributor II
  • 992 Views
  • 1 replies
  • 1 kudos

What could be the issue with parquet file?

When trying to update or display the DataFrame, one of the Parquet files raises an error: "Parquet column cannot be converted. Expected: DecimalType(38,18), Found: DOUBLE". What could be the issue?

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

Try explicitly casting the double column to decimal(38,18) and then displaying the DataFrame.
