Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Jennifer
by New Contributor III
  • 777 Views
  • 1 reply
  • 0 kudos

Support for dataskipping for type TimestampNTZ

More people are beginning to use TimestampNTZ as a clustering key. According to the thread "Unsupported datatype 'TimestampNTZType' with liquid clustering", optimization is not supported yet. We already use this type as a clustering key in production and can't opti...

Latest Reply
Jennifer
New Contributor III
  • 0 kudos

Also, does it mean that even if I specify a column of type TimestampNTZ in the clustering key, the data is not clustered by this column?

anonymous_567
by New Contributor II
  • 994 Views
  • 1 reply
  • 0 kudos

Autoloader ingestion same top level directory different files corresponding to different tables

Hello, currently I have files landing in a storage account. They are all located in subfolders of a common directory. Some subdirectories may contain files, others may not. Each file name is unique and corresponds to a unique table as well. No two fi...

Latest Reply
AmanSehgal
Honored Contributor III
  • 0 kudos

Read all the files using Auto Loader and add an additional column as follows: .withColumn("filePath", input_file_name()). Now that you have the file name, you can split the DataFrame as per your requirement and ingest the data into different tables.
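
A minimal Auto Loader sketch of this approach. The landing path, file format, schema/checkpoint locations, and table names below are placeholders, not values from the original post:

```
from pyspark.sql.functions import input_file_name, col

# Hypothetical landing directory in the storage account
landing_path = "abfss://container@storageaccount.dfs.core.windows.net/landing/"

df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "csv")                      # assumed file format
      .option("cloudFiles.schemaLocation", "/tmp/schemas/landing")
      .load(landing_path)
      .withColumn("filePath", input_file_name()))              # capture source file for routing

# Route rows to a per-table stream based on the file name ("table_a" is a placeholder)
(df.filter(col("filePath").contains("table_a"))
   .writeStream
   .option("checkpointLocation", "/tmp/checkpoints/table_a")
   .toTable("table_a"))
```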

InTimetec
by New Contributor II
  • 3393 Views
  • 4 replies
  • 1 kudos

Unable to connect mongo with Databricks

Hello, I am trying to connect MongoDB with Databricks. I also used an SSL certificate. I created my own cluster and installed the Maven library org.mongodb.spark:mongo-spark-connector_2.12:3.0.1. This is my code: connection_string = f"mongodb://{secret['user']}:{...

[Attachment: InTimetec_0-1712295715248.png]
Latest Reply
InTimetec
New Contributor II
  • 1 kudos

@Retired_mod I updated my code as below: df = spark.read.format("com.mongodb.spark.sql.DefaultSource")\ .option("database", database)\ .option("collection", collection)\ .option("spark.mongodb.input.uri", connectionString)\ ...
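
For completeness, a hedged sketch of the full read, assuming `connectionString`, `database`, and `collection` are defined as in the post and the SSL settings are carried in the connection URI:

```
# Sketch only: option names follow mongo-spark-connector 3.0.1 as used in this thread
df = (spark.read.format("com.mongodb.spark.sql.DefaultSource")
      .option("database", database)
      .option("collection", collection)
      .option("spark.mongodb.input.uri", connectionString)
      .load())

display(df)
```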

arunak
by New Contributor
  • 1973 Views
  • 1 reply
  • 0 kudos

Connecting to Serverless Redshift from a Databricks Notebook

Hello Experts, a new Databricks user here. I am trying to access a Redshift Serverless table using a Databricks notebook. Here is what happens when I try the below code: df = spark.read.format("redshift")\.option("dbtable", "public.customer")\.opti...

Latest Reply
shan_chandra
Databricks Employee
  • 0 kudos

@arunak - we need to specify forward_spark_s3_credentials as true during the read. This helps Spark detect the credentials used to authenticate to the S3 bucket and use them to read from Redshift.
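
A minimal sketch of the read with that option set; the JDBC URL, temp directory, and credentials below are placeholders, not the poster's values:

```
# Sketch: forward the cluster's S3 credentials to the Redshift connector
df = (spark.read.format("redshift")
      .option("url", "jdbc:redshift://<serverless-workgroup-endpoint>:5439/dev?user=<user>&password=<password>")
      .option("dbtable", "public.customer")
      .option("tempdir", "s3a://<bucket>/redshift-temp/")   # staging area for UNLOAD/COPY
      .option("forward_spark_s3_credentials", "true")
      .load())
```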

mh_db
by New Contributor III
  • 3445 Views
  • 1 reply
  • 0 kudos

Write to csv file in S3 bucket

I have a pandas DataFrame in my PySpark notebook. I want to save this DataFrame to my S3 bucket. I'm using the following command to save it: import boto3; import s3fs; df_summary.to_csv(f"s3://dataconversion/data/exclude", index=False) but I keep getting thi...

Latest Reply
shan_chandra
Databricks Employee
  • 0 kudos

Hi @mh_db - you can import the botocore library, or if it is not found, run pip install botocore to resolve this. Alternatively, you can keep the data in a Spark DataFrame without converting it to a pandas DataFrame and write it to a csv. You ...
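
A minimal sketch of the Spark-native alternative, reusing the bucket path from the post and assuming `df_summary` starts out as a pandas DataFrame:

```
# Convert to a Spark DataFrame (skip this step if the data is already in Spark)
df_spark = spark.createDataFrame(df_summary)

(df_spark.coalesce(1)                      # optional: write a single CSV part file
         .write.mode("overwrite")
         .option("header", "true")
         .csv("s3://dataconversion/data/exclude"))
```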

naveenanto
by New Contributor III
  • 1405 Views
  • 0 replies
  • 0 kudos

Custom Spark Extension in SQL Warehouse

I understand only a limited set of Spark configurations is supported in SQL Warehouse, but is it possible to add Spark extensions to SQL Warehouse clusters? Use case: we have a few restricted table properties. We prevent that with Spark extensions installed in...

Data Engineering
sql-warehouse
juanc
by New Contributor II
  • 6543 Views
  • 8 replies
  • 2 kudos

Activate spark extensions on SQL Endpoints

Would it be possible to activate custom extensions like Sedona (https://sedona.apache.org/download/databricks/) on SQL Endpoints? Example error: java.lang.ClassNotFoundException: org.apache.spark.sql.sedona_sql.UDT.GeometryUDT at org.apache.spark....

Latest Reply
naveenanto
New Contributor III
  • 2 kudos

@Retired_mod What is the right way to add a custom Spark extension to SQL warehouse clusters?

marcuskw
by Contributor II
  • 14468 Views
  • 1 reply
  • 0 kudos

Resolved! Lakehouse Federation for SQL Server and Security Policy

We've been able to set up a foreign catalog using the following documentation: https://learn.microsoft.com/en-us/azure/databricks/query-federation/sql-server However, the tables that have RLS using a security policy appear empty. I imagine that this solu...

Latest Reply
marcuskw
Contributor II
  • 0 kudos

Was a bit quick here; I found out that the SUSER_NAME() of the query is, of course, the connection that was set up, i.e. the user/password defined for the foreign connection. Once I added that same user to the RLS logic, I got the correct result.

64883
by New Contributor
  • 1475 Views
  • 1 reply
  • 0 kudos

Support for Delta tables multicluster writes in Databricks cluster

Hello, we're using Databricks on AWS and we've recently started using Delta tables. We're using R. While the code below [1] works in a notebook, when running it from RStudio on a Databricks cluster we get the following error: java.lang.IllegalStateExce...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Sorry for being very late here. If you cannot set multi-cluster writes to false, we can try to split this table into separate tables, one for each stream.
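
For reference, "multi-cluster writes to false" refers to a Spark configuration switch; a minimal sketch, assuming this Delta setting applies to your runtime and workload:

```
# Disable Delta multi-cluster writes for this cluster/session
# (evaluate the consistency trade-offs before using this in production)
spark.conf.set("spark.databricks.delta.multiClusterWrites.enabled", "false")
```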

_Raju
by New Contributor II
  • 4868 Views
  • 1 reply
  • 0 kudos

Cast string to decimal

Hello, can anyone help me with the below error? I'm trying to cast a string column to decimal. When I try to do that, I get "Py4JJavaError: An error occurred while calling t.addCustomDisplayData. : java.sql.SQLException: Status of query a...
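
While the full error is truncated, a minimal sketch of the cast itself may help isolate the problem; the column name `amount` and the precision/scale are assumptions:

```
from pyspark.sql.functions import col

# Cast the string column to DECIMAL; with ANSI mode off, unparseable values become NULL
df2 = df.withColumn("amount_dec", col("amount").cast("decimal(18,2)"))

# Inspect rows that failed to parse
df2.filter(col("amount_dec").isNull() & col("amount").isNotNull()).show()
```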

BeginnerBob
by New Contributor III
  • 44074 Views
  • 6 replies
  • 4 kudos

Resolved! Convert Date to YYYYMMDD in databricks sql

Hi, I have a date column in a Delta table called ADate. I need this in the format YYYYMMDD. In T-SQL this is easy. However, I can't seem to do this without splitting out the year, month, and day and concatenating them together. Any ideas?
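
A minimal sketch of one common approach, using the built-in formatter (the table name is a placeholder):

```
from pyspark.sql.functions import date_format, col

df = spark.table("my_delta_table")
df = df.withColumn("ADate_yyyymmdd", date_format(col("ADate"), "yyyyMMdd"))

# Equivalent Databricks SQL:
# SELECT date_format(ADate, 'yyyyMMdd') AS ADate_yyyymmdd FROM my_delta_table
```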

Latest Reply
JayDoubleYou42
New Contributor II
  • 4 kudos

I'll share that I'm having a variant of the same issue. I have a varchar field in the form YYYYMMDD which I'm trying to join to another varchar field from another table in the form MM/DD/YYYY. Does anyone know of a way to do this in Spark SQL without s...
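
A hedged sketch of one way to join those two layouts: parse both varchar columns to DATE first and join on the parsed values (table and column names are placeholders):

```
from pyspark.sql.functions import to_date, col

a = spark.table("table_a").withColumn("join_date", to_date(col("date_yyyymmdd"), "yyyyMMdd"))
b = spark.table("table_b").withColumn("join_date", to_date(col("date_mdy"), "MM/dd/yyyy"))

joined = a.join(b, on="join_date", how="inner")
```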

lindsey
by New Contributor
  • 2218 Views
  • 0 replies
  • 0 kudos

"Error: cannot read mws credentials: invalid Databricks Account configuration" on TF Destroy

I have a Terraform project that creates a workspace in Databricks, assigns it to an existing metastore, then creates an external location, storage credential, and catalog. The apply works and all expected resources are created. However, without touching any r...

akisugi
by New Contributor III
  • 8000 Views
  • 5 replies
  • 0 kudos

Resolved! Is it possible to control the ordering of the array values created by array_agg()?

Hi! I would be glad to ask you some questions. I have the following data, and I would like to get this kind of result: I want `move` to correspond to the order of `hist`. Therefore, I considered the following query. ```with tmp as (select * from (values(1, ...

[Attachments: スクリーンショット 2024-04-06 23.08.15.png, スクリーンショット 2024-04-06 23.07.34.png]
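
A minimal PySpark sketch of one way to keep `move` in the order of `hist`: aggregate (hist, move) pairs, sort the array by hist, then project out move. The grouping column `id` is an assumption:

```
from pyspark.sql import functions as F

result = (df.groupBy("id")
            .agg(F.sort_array(F.collect_list(F.struct("hist", "move"))).alias("pairs"))
            .withColumn("moves", F.col("pairs.move"))   # array of move values ordered by hist
            .drop("pairs"))
```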
Latest Reply
akisugi
New Contributor III
  • 0 kudos

Hi @ThomazRossito, this is a great idea. It can solve my problem. Thank you.

