Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

pragarwal
by Databricks Partner
  • 4292 Views
  • 2 replies
  • 0 kudos

Export Users and Groups from Unity Catalog

Hi, I am trying to export the list of users and groups from Unity Catalog through the Databricks workspace, but I see only the users and groups created inside the workspace, not the groups and users provisioned through SCIM in Unity Catalog. How can I ge...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Hello, when you refer to the users and groups in Unity Catalog, do you mean the ones created at the account level? If so, you need to run the API call at the account level, not the workspace level. You can see the API doc for account le...

1 More Replies
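
As a rough illustration of the account-level call suggested in the reply, here is a minimal sketch that pages through the account SCIM endpoints for Users or Groups. It assumes the AWS accounts host `https://accounts.cloud.databricks.com` (Azure and GCP use different hosts), a bearer token with account admin rights, and the standard SCIM `startIndex`/`count` paging fields. The helper names `scim_url`, `http_get_json`, and `list_scim` are made up for this example.

```python
import json
import urllib.request
from typing import Callable, Iterator

# Assumption: AWS accounts console host. Azure and GCP use different hosts.
ACCOUNTS_HOST = "https://accounts.cloud.databricks.com"


def scim_url(account_id: str, resource: str, start_index: int, count: int) -> str:
    """Build the account-level SCIM URL for 'Users' or 'Groups'."""
    return (
        f"{ACCOUNTS_HOST}/api/2.0/accounts/{account_id}/scim/v2/{resource}"
        f"?startIndex={start_index}&count={count}"
    )


def http_get_json(url: str, token: str) -> dict:
    """Perform an authenticated GET and decode the JSON body."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def list_scim(
    account_id: str,
    resource: str,
    fetch: Callable[[str], dict],
    page_size: int = 100,
) -> Iterator[dict]:
    """Yield every SCIM resource, following 1-based startIndex paging."""
    start = 1
    while True:
        page = fetch(scim_url(account_id, resource, start, page_size))
        items = page.get("Resources", [])
        yield from items
        start += len(items)
        # Stop on an empty page or once we have walked past totalResults.
        if not items or start > page.get("totalResults", 0):
            break
```

Usage would look something like `users = list(list_scim(account_id, "Users", lambda u: http_get_json(u, token)))`, where `account_id` and `token` are your own values. Passing the fetch function in keeps the paging logic easy to check without network access.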
Jorge3
by New Contributor III
  • 3254 Views
  • 1 reply
  • 0 kudos

Trigger a job on file update

I'm using Auto Loader to process any new file or update that arrives in my landing area, and I schedule the job using Databricks Workflows to trigger on file arrival. The issue is that the trigger only executes when new files arrive, not when an existing ...

Latest Reply
Ivan_Donev
New Contributor III
  • 0 kudos

I don't think you can effectively achieve your goal. While it's theoretically somewhat possible, the Databricks documentation says there is no guarantee of correctness: Auto Loader FAQ | Databricks on AWS

Anonymous
by Not applicable
  • 10427 Views
  • 2 replies
  • 1 kudos

When reading a CSV file with spark.read, the data is not loading in the appropriate column while pas

I am trying to read a CSV file from a storage location using the spark.read function, and I am explicitly passing the schema to the function. However, the data is not loading into the proper columns of the DataFrame. Following are the code details: from pyspark....

Latest Reply
sai_sathya
New Contributor III
  • 1 kudos

Hi, I would suggest following the approach proposed by Thomaz Rossito, but you could also try swapping the struct field order, like this: schema = StructType([StructField('DA_RATE', DateType(), True),StructField('CURNCY_F', StringTy...

1 More Replies
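
The field-order suggestion matters because, when an explicit schema is passed to the CSV reader, Spark assigns columns by position rather than by header name. A minimal sketch of deriving the field order from the file's header line is below. The header order `CURNCY_F,DA_RATE` is purely hypothetical (the original post's file layout is not shown), and `order_like_header` and `build_schema` are illustrative helper names, not part of PySpark.

```python
import csv
import io
from typing import Dict, List, Tuple


def order_like_header(
    header_line: str, field_types: Dict[str, str], sep: str = ","
) -> List[Tuple[str, str]]:
    """Return (name, type) pairs in the same order as the CSV header."""
    names = [n.strip() for n in next(csv.reader(io.StringIO(header_line), delimiter=sep))]
    missing = [n for n in names if n not in field_types]
    if missing:
        raise KeyError(f"No type declared for columns: {missing}")
    return [(n, field_types[n]) for n in names]


def build_schema(ordered: List[Tuple[str, str]]):
    """Turn the ordered pairs into a StructType (needs pyspark installed)."""
    from pyspark.sql.types import DateType, StringType, StructField, StructType

    type_map = {"date": DateType(), "string": StringType()}
    return StructType([StructField(name, type_map[t], True) for name, t in ordered])


# Hypothetical header order; only the two column names come from the thread.
ordered = order_like_header(
    "CURNCY_F,DA_RATE", {"DA_RATE": "date", "CURNCY_F": "string"}
)
# schema = build_schema(ordered)
# df = spark.read.csv(path, header=True, schema=schema)  # `path` is your file
```

Building the schema from the header keeps the declared order and the file order from drifting apart when columns are added or rearranged.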
dvmentalmadess
by Valued Contributor
  • 8582 Views
  • 3 replies
  • 0 kudos

Resolved! OPTIMIZE: Exception thrown in awaitResult: / by zero

We run `OPTIMIZE` on our tables every 24 hours as follows: spark.sql(f'OPTIMIZE {catalog_name}.{schema_name}.`{table_name}`;') This morning one of our hourly jobs started failing on the call to `OPTIMIZE` with the error: org.apache.spark.SparkException...

Latest Reply
sh
New Contributor II
  • 0 kudos

I am getting the same error. Is there any resolution?

2 More Replies
ksenija
by Contributor
  • 7687 Views
  • 1 reply
  • 1 kudos

Resolved! Cluster pools

Could you help me understand pools? How can I tell the difference in pricing between running clusters and running clusters with a pool, given that a pool saves the time needed to start and stop the cluster? And should we keep Min Idle above 0 or equal t...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

Databricks pools are a set of idle, ready-to-use instances. When a cluster is attached to a pool, cluster nodes are created using the pool’s idle instances. If the pool has no idle instances, the pool expands by allocating a new instance from the ins...

drag7ter
by Contributor
  • 3713 Views
  • 2 replies
  • 0 kudos

Resolved! How to enable CDF when using saveAsTable from PySpark code?

I'm running this code in a Databricks notebook, and I want the table created in the catalog from the DataFrame to have CDF enabled. When I run the code, the table doesn't exist yet. This code doesn't create a table with CDF enabled. It doesn't add: delta.enableChang...

Latest Reply
raphaelblg
Databricks Employee
  • 0 kudos

Hello @drag7ter, I don't see anything wrong with your approach. Check my repro:

1 More Replies
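
Since the code from the thread is truncated here, the following is only a sketch of two documented ways to end up with a change data feed (CDF) enabled table, not a reconstruction of the poster's code or of the repro in the reply. It assumes a Delta table, and uses the session default `spark.databricks.delta.properties.defaults.enableChangeDataFeed` for tables created afterwards and `ALTER TABLE ... SET TBLPROPERTIES` for a table that already exists. The helper names are illustrative.

```python
# Session-level default applied to Delta tables created after it is set.
CDF_SESSION_DEFAULT = "spark.databricks.delta.properties.defaults.enableChangeDataFeed"


def enable_cdf_sql(table_name: str) -> str:
    """SQL that turns on the change data feed for an existing Delta table."""
    return (
        f"ALTER TABLE {table_name} "
        "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
    )


def save_with_cdf(spark, df, table_name: str) -> None:
    """Create the table via saveAsTable with CDF enabled from the start."""
    # Must be set before the table is created; it does not alter existing tables.
    spark.conf.set(CDF_SESSION_DEFAULT, "true")
    df.write.format("delta").saveAsTable(table_name)


# For a table that already exists (hypothetical name):
# spark.sql(enable_cdf_sql("main.default.my_table"))
```

To check the result, `SHOW TBLPROPERTIES <table>` should list `delta.enableChangeDataFeed` once either path has been applied.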
nyehia
by Databricks Partner
  • 10046 Views
  • 9 replies
  • 0 kudos

Cannot access a SQL file from a notebook

Hey, I have a repo of notebooks and SQL files. The typical workflow is to update or create notebooks in the repo and push them, and then the CI/CD pipeline deploys the notebooks to the Shared workspace. The issue is that I can access the SQL files in the repo but can not ...

Latest Reply
ok_1
New Contributor II
  • 0 kudos

ok

8 More Replies
Deepikamani
by New Contributor
  • 4857 Views
  • 1 reply
  • 0 kudos

Exam voucher

Hi, I am planning to take the Databricks Certified Data Engineer Associate certification. Where can I get the exam voucher?

Latest Reply
TPSteve
New Contributor II
  • 0 kudos

The Help Center provides an additional forum for this topic. You can request a voucher by submitting a help request; however, vouchers are not provided in all cases. Other ways to obtain a voucher are participation in training events held throughout ...

cszczotka
by New Contributor III
  • 2811 Views
  • 3 replies
  • 1 kudos

Not able to create table shallow clone on DBR 15.0

Hi, I'm getting the error below when I try to create a shallow clone of a table on DBR 15.0. [CANNOT_SHALLOW_CLONE_NON_UC_MANAGED_TABLE_AS_SOURCE_OR_TARGET] Shallow clone is only supported for the MANAGED table type. The table xxx_clone is not MANAGED tab...

Latest Reply
cszczotka
New Contributor III
  • 1 kudos

Hi, the source table is an external table in UC, and the result table should also be external. I'm running this command: CREATE TABLE target_catalog.target_schema.table_clone SHALLOW CLONE source_catalog.source_schema.source_table but this for some reason doesn't...

2 More Replies
Sandesh87
by New Contributor III
  • 7573 Views
  • 3 replies
  • 2 kudos

Task not serializable: java.io.NotSerializableException: org.apache.spark.sql.streaming.DataStreamWriter

I have a getS3Objects function to get (JSON) objects located in AWS S3: object client_connect extends Serializable { val s3_get_path = "/dbfs/mnt/s3response" def getS3Objects(s3ObjectName: String, s3Client: AmazonS3): String = { val...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hey there @Sandesh Puligundla, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear f...

2 More Replies
surband
by New Contributor III
  • 8561 Views
  • 7 replies
  • 1 kudos

Resolved! Failures Streaming data to Pulsar

I am encountering the following exception when attempting to stream data to a Pulsar topic. This is a first-time implementation, so any ideas to resolve this are greatly appreciated. DBR: 14.3 LTS ML (includes Apache Spark 3.5.0, Scala 2.12) 1 Driver 64 GB...

Latest Reply
shan_chandra
Databricks Employee
  • 1 kudos

Hi @surband - can you please share the full error stack trace? Also, please use a compatible DBR (Spark) version instead of the ML runtime. Please refer to the document below and validate that you have the necessary connector libraries added to the clust...

6 More Replies
Jennifer
by New Contributor III
  • 1302 Views
  • 1 reply
  • 0 kudos

Support for data skipping for type TimestampNTZ

More people are beginning to use TimestampNTZ as a cluster key. According to the thread here, Unsupported datatype 'TimestampNTZType' with liquid clustering, optimization is not supported yet. We already use this type as a cluster key in production and can't opti...

Latest Reply
Jennifer
New Contributor III
  • 0 kudos

Also, does it mean that even if I specify a column of type TimestampNTZ in the clustering key, the table is not clustered by this column?

anonymous_567
by New Contributor II
  • 1582 Views
  • 1 reply
  • 0 kudos

Auto Loader ingestion: same top-level directory, different files corresponding to different tables

Hello, currently I have files landing in a storage account. They are all located in subfolders of a common directory. Some subdirectories may contain files, others may not. Each file name is unique and corresponds to a unique table as well. No two fi...

Latest Reply
AmanSehgal
Honored Contributor III
  • 0 kudos

Read all the files using Auto Loader and add an additional column as follows: .withColumn("filePath", input_file_name()) Now that you have the file name, you can split the DataFrame as per your requirement and ingest the data into different tables.
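
One possible way to do the split described in this reply is sketched below. It assumes each file's base name (without extension) maps to the target table name, which is a guess at the poster's naming rule, and that the stream is written with `foreachBatch`. The names `table_for_path`, `write_batch_by_table`, and the `bronze` schema are hypothetical.

```python
import posixpath
import re
from urllib.parse import urlparse


def table_for_path(file_path: str) -> str:
    """Derive a table name from a file path: base name, no extension, sanitized."""
    name = posixpath.basename(urlparse(file_path).path)
    stem = name.rsplit(".", 1)[0] if "." in name else name
    return re.sub(r"[^0-9A-Za-z_]", "_", stem).lower()


def write_batch_by_table(batch_df, batch_id, schema_name="bronze"):
    """foreachBatch handler: append each file's rows to its own Delta table."""
    # Collect the distinct source files seen in this micro-batch.
    paths = [r["filePath"] for r in batch_df.select("filePath").distinct().collect()]
    by_table = {}
    for p in paths:
        by_table.setdefault(table_for_path(p), []).append(p)
    for table, table_paths in by_table.items():
        (
            batch_df.filter(batch_df["filePath"].isin(table_paths))
            .drop("filePath")
            .write.format("delta")
            .mode("append")
            .saveAsTable(f"{schema_name}.{table}")
        )


# Wiring it up (stream_df is the Auto Loader stream with the filePath column):
# stream_df.writeStream.foreachBatch(write_batch_by_table).start()
```

If the files for different tables have different columns, reading them all through one stream may not be practical; one stream per table with a path glob filter is another option to consider.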

InTimetec
by New Contributor II
  • 5067 Views
  • 4 replies
  • 1 kudos

Unable to connect MongoDB with Databricks

Hello, I am trying to connect MongoDB with Databricks. I also used an SSL certificate. I created my own cluster and installed the Maven library org.mongodb.spark:mongo-spark-connector_2.12:3.0.1. This is my code: connection_string =f"mongodb://{secret['user']}:{...

Latest Reply
InTimetec
New Contributor II
  • 1 kudos

@Retired_mod I updated my code as below: df = spark.read.format("com.mongodb.spark.sql.DefaultSource")\ .option("database", database)\ .option("collection", collection)\ .option("spark.mongodb.input.uri", connectionString)\ ...

3 More Replies
arunak
by New Contributor
  • 2710 Views
  • 1 reply
  • 0 kudos

Connecting to Serverless Redshift from a Databricks Notebook

Hello experts, a new Databricks user here. I am trying to access a Redshift Serverless table using a Databricks notebook. Here is what happens when I try the code below: df = spark.read.format("redshift")\.option("dbtable", "public.customer")\.opti...

Latest Reply
shan_chandra
Databricks Employee
  • 0 kudos

@arunak - we need to set forward_spark_s3_credentials to true during the read. This lets Spark detect the credentials it uses to authenticate to the S3 bucket and forward them so that they can be used to read from Redshift.
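
A minimal sketch of a read with that option set might look like the following. The `redshift` format and the `dbtable` value come from the original post; the JDBC URL and the S3 `tempdir` shown in the usage comment are placeholders, and the helper names are made up for this example.

```python
from typing import Dict


def redshift_read_options(jdbc_url: str, dbtable: str, tempdir: str) -> Dict[str, str]:
    """Collect reader options, forwarding Spark's S3 credentials to Redshift."""
    return {
        "url": jdbc_url,        # JDBC URL of the Redshift endpoint
        "dbtable": dbtable,     # table to read, e.g. public.customer
        "tempdir": tempdir,     # S3 staging location used for the unload
        "forward_spark_s3_credentials": "true",
    }


def read_redshift(spark, options: Dict[str, str]):
    """Load a DataFrame from Redshift using the options above."""
    return spark.read.format("redshift").options(**options).load()


# Placeholder values, replace with your own endpoint and bucket:
# opts = redshift_read_options(
#     "jdbc:redshift://<host>:5439/<database>?user=<user>&password=<password>",
#     "public.customer",
#     "s3a://<bucket>/<prefix>/",
# )
# df = read_redshift(spark, opts)
```

Only one authentication mechanism for the S3 staging area should be configured at a time, so this option would not be combined with an IAM role option on the same read.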
