Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

shadowinc
by New Contributor III
  • 1336 Views
  • 2 replies
  • 2 kudos

Resolved! spark/databricks temporary views and uuid

Hi All, We have a table with an id column generated by uuid(). For ETL we use Databricks/Spark SQL temporary views. We observed strange behavior between a Databricks SQL temp view (create or replace temporary view) and a Spark SQL temp view (df.creat...

Data Engineering
Databricks SQL
spark sql
temporary views
uuid
Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @shadowinc, Creation of Temporary Views: Databricks SQL: When you create a temporary view using Databricks SQL, it’s scoped to the cluster and remains accessible until the cluster restarts or you explicitly drop it. Spark SQL: In contrast, Spa...
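A minimal sketch of the pattern under discussion (table and column names are hypothetical). Because uuid() is non-deterministic and a view's body is re-evaluated on each query, ids produced through a temp view can differ between reads:

```sql
-- The view body is re-run on every query, so uuid() can yield
-- different ids on each SELECT against the view.
CREATE OR REPLACE TEMPORARY VIEW staged AS
SELECT uuid() AS id, col1, col2
FROM source_table;

SELECT id FROM staged; -- values may differ from a later SELECT
```

Materializing the ids first (for example into a Delta table) pins them, which is one way to keep downstream ETL deterministic.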

1 More Replies
NataliaCh
by New Contributor
  • 1125 Views
  • 1 replies
  • 0 kudos

Delta table cannot be reached with INTERNAL_ERROR

Hi all! I've been dropping and recreating delta tables at the new location. For one table something went wrong and now I can neither DROP nor recreate it. It is visible in the catalog; however, when I click on the table I see the message: [INTERNAL_ERROR] The ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

I’m sorry to hear that you’re encountering this issue with your Delta table. Ensure that you are using a compatible version of Spark and its associated plugins. Sometimes, upgrading or downgrading Spark can resolve issues related to internal error...

JOFinancial
by New Contributor
  • 745 Views
  • 2 replies
  • 0 kudos

No Data for External Table from Blob Storage

Hi All, I am trying to create an external table from an Azure Blob Storage container. I receive no errors, but there is no data in the table. The Blob Storage contains 4 CSV files with the same columns and about 10k rows of data. Am I missing someth...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @JOFinancial,  Make sure that the CSV files in your Azure Blob storage container use the correct delimiter (usually a comma) and have consistent column names. Any variation in the column names or the delimiter could cause issues.Verify that the fo...
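A hedged sketch of the kind of external-table definition being discussed (schema, table name, path, and options are all hypothetical; adjust to your storage setup and credentials):

```sql
CREATE TABLE my_schema.sales_ext
USING CSV
OPTIONS (
  path 'abfss://mycontainer@myaccount.dfs.core.windows.net/sales/',
  header 'true',
  inferSchema 'true'
);
```

If the table was created before the files landed, running `REFRESH TABLE my_schema.sales_ext` can help Spark pick up the current file listing.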

1 More Replies
ijw
by New Contributor
  • 388 Views
  • 1 replies
  • 0 kudos

Databricks Serverless Compute

Does Databricks support SQL Queries to extract data from Rest APIs?

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @ijw,  Recently, Databricks introduced the Databricks SQL Statement Execution API, which allows you to connect to your Databricks SQL warehouse over a REST API and access and manipulate data managed by the Databricks Lakehouse Platform.

AmnBrt
by New Contributor
  • 752 Views
  • 1 replies
  • 0 kudos

"Databricks Accredited Lakehouse Fundamentals" Badge not received.

Hello, so today I watched the tutorial videos and passed the knowledge test as required to earn the "Databricks Accredited Lakehouse Fundamentals" badge. Instead, I received the "Certificate of Completion of Fundamentals of the Databricks Lakehouse P...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @AmnBrt, Thank you for posting your concern on Community!   To expedite your request, please list your concerns on our ticketing portal. Our support staff would be able to act faster on the resolution (our standard resolution time is 24-48 hours).

Dhruv-22
by New Contributor III
  • 428 Views
  • 1 replies
  • 0 kudos

NamedStruct fails in the 'IN' query

I've posted the same question on stackoverflow (link) as well, and will post any solution I get there. I was trying to understand using many columns in the IN query and came across this statement: SELECT (1, 2) IN (SELECT c1, c2 FROM VALUES(1, 2), (3, 4...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Dhruv-22,  The IN operator expects a subquery to return a single column. When you use a subquery with IN, it compares the value from the left side (in your case, the named struct) with the values returned by the subquery on the right side. I...
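As a small sketch of the multi-column form the reply is contrasting with (values taken from the question), Spark accepts a tuple on the left of IN when the subquery returns the same number of columns:

```sql
-- Tuple on the left, matching two-column subquery on the right.
SELECT (1, 2) IN (SELECT c1, c2 FROM VALUES (1, 2), (3, 4) AS t(c1, c2));
```

The failure in the question appears specific to wrapping the left side in a named struct rather than the plain tuple form.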

AnkithP
by New Contributor
  • 403 Views
  • 1 replies
  • 0 kudos

Datatype changed while writing in delta format

Hello team, I'm encountering an issue with my batch processing job. Initially, I write the job in overwrite mode with overwriteSchema set to true. However, when I attempt to write the next batch in append mode, it fails due to a change in the datatyp...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @AnkithP, When reading data in overwrite mode, use schema inference to automatically detect the column datatypes. Most batch processing frameworks (e.g., Apache Spark™, Apache Flink) provide this feature. Store the inferred schema (e.g., as a JSON...

Maatari
by New Contributor III
  • 856 Views
  • 1 replies
  • 1 kudos

Resolved! DataBricks Auto loader vs input source files deletion detection

Hi, While ingesting files from a source folder continuously, I would like to be able to detect when files are deleted. As far as I can tell, the Autoloader cannot handle detection of files deleted in the source folder. Hence the c...

Latest Reply
Yeshwanth
Honored Contributor
  • 1 kudos

@Maatari Yes, it is true that Autoloader in Databricks cannot detect the deletion of files in the source folder during continuous ingestion. The Autoloader is designed to process files exactly once unless the option "cloudFiles.allowOverwrites" is en...
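In DLT SQL, that option can be passed through cloud_files, roughly as sketched below (path and table names are hypothetical). Note this only re-processes overwritten files; it still does not detect deletions:

```sql
CREATE OR REFRESH STREAMING LIVE TABLE ingested AS
SELECT *
FROM cloud_files(
  'abfss://landing@myaccount.dfs.core.windows.net/source/',
  'json',
  map('cloudFiles.allowOverwrites', 'true')
);
```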

chardv
by New Contributor II
  • 832 Views
  • 2 replies
  • 0 kudos

Lakehouse Federation Multi-User Authorization

Since Lakehouse Fed uses only one credential per connection to the foreign database, all queries using the connection will see all the data the credential has access to. Would anyone know if Lakehouse Fed will support authorization using the cred...

Latest Reply
Yeshwanth
Honored Contributor
  • 0 kudos

@chardv, good day! Could you please share more details and any documentation you have referred to?

1 More Replies
as999
by New Contributor III
  • 10788 Views
  • 8 replies
  • 6 kudos

Databricks hive metastore location?

In Databricks, where is the Hive metastore located: the control plane or the data plane? For prod systems, what precautions should be taken to secure the Hive metastore?

Latest Reply
Prabakar
Esteemed Contributor III
  • 6 kudos

@as999​ The default metastore is managed by Databricks. If you are concerned about security and would like to have your own metastore, you can go for the external metastore setup. You have the detailed steps in the below doc for setting up the external...

7 More Replies
MarkusFra
by New Contributor III
  • 3224 Views
  • 3 replies
  • 1 kudos

Re-establish SparkSession using Databricks connect after cluster restart

Hello, when developing locally using Databricks Connect, how do I re-establish the SparkSession after the cluster restarts? getOrCreate() seems to return the old, invalid SparkSession even after the cluster restart instead of creating a new one, or am I missing...

Data Engineering
databricks-connect
Latest Reply
Michael_Chein
New Contributor II
  • 1 kudos

If anyone encounters this problem, the solution that worked for me was to restart the Jupyter kernel. 

2 More Replies
prabhu26
by New Contributor
  • 653 Views
  • 1 replies
  • 0 kudos

Unable to enforce schema on data read from jsonl file in Azure Databricks using pyspark

I'm trying to build an ETL pipeline that reads JSONL files from Azure Blob Storage, then transforms and loads them into Delta tables in Databricks. I have created the below schema for loading my data:  schema = StructType([ S...

Latest Reply
DataEngineer
New Contributor II
  • 0 kudos

Try this: add option("multiLine", "true") when reading the JSON files.

MarkD
by New Contributor II
  • 2437 Views
  • 8 replies
  • 0 kudos

SET configuration in SQL DLT pipeline does not work

Hi, I'm trying to set a dynamic value to use in a DLT query, and the code from the example documentation does not work: SET startDate='2020-01-01'; CREATE OR REFRESH LIVE TABLE filtered AS SELECT * FROM my_table WHERE created_at > ${startDate}; It is g...

Data Engineering
Delta Live Tables
dlt
sql
Latest Reply
Hkesharwani
Contributor II
  • 0 kudos

Hi @MarkD, You may use set variable_name.var = '1900-01-01' to set the value of a variable, and to use the value, reference ${automated_date.var}. Example: set automated_date.var = '1800-01-01' select * from my table where date = CAST(${autom...
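Spelled out, the pattern from this reply looks roughly like the sketch below (table and variable names are hypothetical, following the question's DLT example):

```sql
SET automated_date.var = '1800-01-01';

CREATE OR REFRESH LIVE TABLE filtered AS
SELECT *
FROM my_table
WHERE created_at > CAST(${automated_date.var} AS DATE);
```

Because the SET value includes the quotes, ${automated_date.var} substitutes a string literal, which the CAST then converts to a DATE.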

7 More Replies
pshuk
by New Contributor III
  • 836 Views
  • 2 replies
  • 1 kudos

upload file/table to delta table using CLI

Hi, I am using the CLI to transfer local files to a Databricks Volume. At the end of my upload, I want to create a meta table (storing file name, location, and some other information) and have it as a table on the Databricks Volume. I am not sure how to create ...

Latest Reply
Ayushi_Suthar
Honored Contributor
  • 1 kudos

Hi @pshuk, Greetings! We understand that you are looking for a CLI command to create a table, but at this moment Databricks doesn't support a CLI command to create tables. You can, however, use the SQL Execution API - https://docs.databricks.com/api/workspace/...

1 More Replies
dbal
by New Contributor III
  • 1107 Views
  • 2 replies
  • 0 kudos

withColumnRenamed does not work with databricks-connect 14.3.0

I am not able to run our unit test suite due to a possible bug in the databricks-connect library. The problem is with the DataFrame transformation withColumnRenamed. When I run it in a Databricks cluster (Databricks Runtime 14.3 LTS), the column is ren...

Latest Reply
shan_chandra
Esteemed Contributor
  • 0 kudos

@dbal - can you please try withColumnsRenamed() instead? Reference: https://docs.databricks.com/en/release-notes/dbconnect/index.html#databricks-connect-1430-python

1 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group